AITopics | hubness reduction

Collaborating Authors

hubness reduction

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Are we describing the same sound? An analysis of word embedding spaces of expressive piano performance

Peter, Silvan David, Chowdhury, Shreyan, Cancino-Chacón, Carlos Eduardo, Widmer, Gerhard

arXiv.org Artificial IntelligenceDec-31-2023

Semantic embeddings play a crucial role in natural language-based information retrieval. Embedding models represent words and contexts as vectors whose spatial configuration is derived from the distribution of words in large text corpora. While such representations are generally very powerful, they might fail to account for fine-grained domain-specific nuances. In this article, we investigate this uncertainty for the domain of characterizations of expressive piano performance. Using a music research dataset of free text performance characterizations and a follow-up study sorting the annotations into clusters, we derive a ground truth for a domain-specific semantic similarity structure. We test five embedding models and their similarity structure for correspondence with the ground truth. We further assess the effects of contextualizing prompts, hubness reduction, cross-modal similarity, and k-means clustering. The quality of embedding models shows great variability with respect to this task; more general models perform better than domain-adapted ones and the best model configurations reach human-level agreement.

characterization, reduction, similarity, (17 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3632754.3632759

2401.02979

Country:

Asia > India (0.06)
Europe > Austria > Upper Austria > Linz (0.05)
North America > United States > New York > New York County > New York City (0.04)
(2 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.34)

Add feedback

Hubness Reduction Improves Sentence-BERT Semantic Spaces

Nielsen, Beatrix M. G., Hansen, Lars Kai

arXiv.org Artificial IntelligenceNov-30-2023

Semantic representations of text, i.e. representations of natural language which capture meaning by geometry, are essential for areas such as information retrieval and document grouping. High-dimensional trained dense vectors have received much attention in recent years as such representations. We investigate the structure of semantic spaces that arise from embeddings made with Sentence-BERT and find that the representations suffer from a well-known problem in high dimensions called hubness. Hubness results in asymmetric neighborhood relations, such that some texts (the hubs) are neighbours of many other texts while most texts (so-called anti-hubs), are neighbours of few or no other texts. We quantify the semantic quality of the embeddings using hubness scores and error rate of a neighbourhood based classifier. We find that when hubness is high, we can reduce error rate and hubness using hubness reduction methods. We identify a combination of two methods as resulting in the best reduction. For example, on one of the tested pretrained models, this combined method can reduce hubness by about 75% and error rate by about 9%. Thus, we argue that mitigating hubness in the embedding space provides better semantic representations of text.

base 0, local scaling 0, mp 0, (14 more...)

arXiv.org Artificial Intelligence

2311.18364

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > Dominican Republic (0.04)
(5 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

scikit-hubness: Hubness Reduction and Approximate Neighbor Search

Feldbauer, Roman, Rattei, Thomas, Flexer, Arthur

arXiv.org Machine LearningDec-2-2019

This paper introduces scikit-hubness, a Python package for efficient nearest neighbor search in high-dimensional spaces. Hubness is an aspect of the curse of dimensionality, and is known to impair various learning tasks, including classification, clustering, and visualization. scikit-hubness provides algorithms for hubness analysis ("Is my data affected by hubness?"), hubness reduction ("How can we improve neighbor retrieval in high dimensions?"), and approximate neighbor search ("Does it work for large data sets?"). It is integrated into the scikit-learn environment, enabling rapid adoption by Python-based machine learning researchers and practitioners. Users will find all functionality of the scikit-learn neighbors package, plus additional support for transparent hubness reduction and approximate nearest neighbor search. scikit-hubness is developed using several quality assessment tools and principles, such as PEP8 compliance, unit tests with high code coverage, continuous integration on all major platforms (Linux, MacOS, Windows), and additional checks by LGTM. The source code is available at https://github.com/VarIr/scikit-hubness under the BSD 3-clause license. Install from the Python package index with $ pip install scikit-hubness.

hubness reduction, neighbor, reduction, (12 more...)

arXiv.org Machine Learning

1912.00706

Country: Europe > Austria > Vienna (0.15)

Genre: Research Report (0.41)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.69)

Add feedback